Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors

Authors

  • Pangfeng Liu
  • Jan-Jan Wu
  • Chih-Hsuae Yang
Abstract

Load balancing and data locality are the two most important factors affecting the performance of parallel programs running on distributed-memory multiprocessors. A good balancing scheme should distribute the workload evenly among the available processors and place tasks close to their data to reduce communication and idle time. In this paper, we study the load balancing problem of data-parallel loops with predictable neighborhood data references. The loops are characterized by variable and unpredictable execution time due to dynamic external workload. Nevertheless, the data referenced by each loop iteration exhibits the spatial locality of stencil references. We combine an initial static BLOCK scheduling with a dynamic scheduling based on work stealing. Data locality is preserved by careful restrictions on the tasks that can be migrated. Experimental results on a network of workstations are reported.
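As a rough illustration of the combination described in the abstract, the following Python sketch pairs a static BLOCK partition with a restricted form of work stealing. The abstract does not say which restrictions the authors place on migrated tasks, so the rule used here is an assumption for illustration only: a processor may take iterations only from an adjacent processor, and only from the boundary of that neighbor's block, so every processor keeps one contiguous range. The names `block_partition` and `steal_from_neighbor` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # half-open range [lo, hi) of loop iterations


def block_partition(n_iters: int, n_procs: int) -> List[Block]:
    """Static BLOCK schedule: contiguous ranges of near-equal size."""
    base, extra = divmod(n_iters, n_procs)
    blocks, lo = [], 0
    for p in range(n_procs):
        hi = lo + base + (1 if p < extra else 0)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def steal_from_neighbor(blocks: List[Block], thief: int, victim: int,
                        count: int) -> List[Block]:
    """Move up to `count` iterations from `victim` to `thief`.

    Assumed restriction (not taken from the paper): only adjacent
    processors may exchange work, and only iterations at the shared
    block boundary move, so each range stays contiguous.
    """
    if abs(thief - victim) != 1:
        raise ValueError("only adjacent processors may exchange iterations")
    new = list(blocks)
    t_lo, t_hi = blocks[thief]
    v_lo, v_hi = blocks[victim]
    count = max(0, min(count, v_hi - v_lo))  # cannot take more than exists
    if victim == thief + 1:
        # Victim is the right neighbor: take its lowest iterations.
        new[thief] = (t_lo, t_hi + count)
        new[victim] = (v_lo + count, v_hi)
    else:
        # Victim is the left neighbor: take its highest iterations.
        new[victim] = (v_lo, v_hi - count)
        new[thief] = (t_lo - count, t_hi)
    return new


if __name__ == "__main__":
    blocks = block_partition(10, 4)
    print(blocks)
    print(steal_from_neighbor(blocks, thief=2, victim=1, count=1))
```

Because the ranges stay contiguous under this assumed rule, a stencil iteration still finds most of its neighboring data on the same processor after migration; only the elements at the moved boundary need to be communicated.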

Similar resources

CYCLIC: A Locality-Preserving Load-Balancing Algorithm for PDES on Shared Memory Multiprocessors

This paper presents a new load-balancing algorithm for shared memory multiprocessors that is currently being applied to the parallel simulation of logic circuits, specifically VHDL simulations. The main idea of this load-balancing algorithm is based on the exploitation of the usual characteristics of these simulations, that is, cyclicity and predictability, to obtain a good load balance while p...

Dynamic Scheduling on Distributed-Memory Multiprocessors

The problem of scheduling a set of applications on a multiprocessor system has been investigated in a number of different points of view. This paper describes our work on the scheduling problem at the user level, where we have to distribute evenly the parallel tasks that compose a program among a set of processors. We investigated dynamic scheduling heuristics applied to loops on distributed-mem...

Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP

Exploiting cache locality of parallel programs at runtime is a complementary approach to a compiler optimization. This is particularly important for those applications with dynamic memory access patterns. We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and targeted ar...

A Load Balancing Strategy for Iterated Parallel Loop Scheduling

An efficient template for the implementation on distributed-memory multiprocessors of iterated parallel loops, i.e. parallel loops nested in a sequential loop, is presented. The template is explicitly designed to smooth unbalanced processor workloads deriving from loops whose iterations are characterized by highly varying execution times. Experiments conducted show performance gains w.r.t. HPF-l...

Cache-Affinity Scheduling for Fine Grain Multithreading

Cache utilisation is often very poor in multithreaded applications, due to the loss of data access locality incurred by frequent context switching. This problem is compounded on shared memory multiprocessors when dynamic load balancing is introduced and thread migration disrupts cache content. In this paper, we present a technique, which we refer to as ‘batching’, for reducing the negative impa...

Journal:
  • J. Inf. Sci. Eng.

Volume 18

Publication date: 2002